
Replace websave with curl-based downloads to prevent corruption #746

Merged
stevewds merged 6 commits into main from claude/fix-linux-compression-issue-JkWEX
Apr 18, 2026

@stevevanhooser
Contributor

Summary

This PR replaces websave with curl-based downloads across three download modules to prevent HTTP content-encoding corruption. The websave function automatically decompresses HTTP responses, which can corrupt binary files when the gateway applies compression at the HTTP level.

Key Changes

  • downloadDocumentCollection.m: Replaced websave with ndi.cloud.api.files.getFile(..., 'useCurl', true) and improved error handling with retry logic that captures the last error message
  • downloadDatasetFiles.m: Replaced websave with curl-based getFile call with proper error handling
  • downloadGenericFiles.m: Replaced websave with curl-based getFile call with proper error handling
  • Updated documentation in downloadDocumentCollection.m to clarify that the Timeout option applies to the download operation generally (not just websave)

Implementation Details

  • All three modules now use the same ndi.cloud.api.files.getFile() function with the 'useCurl' parameter set to true
  • Error handling checks the success_d return value and converts error responses to strings for consistent error messaging
  • In downloadDocumentCollection.m, the retry loop now properly tracks the last error message (lastErr) to report in the final timeout error, fixing a potential bug where ME.message could be undefined
  • Minor formatting fix: added newline at end of downloadDocumentCollection.m file
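
The retry behavior described above can be sketched roughly as follows. This is a shell illustration of the pattern only, not the MATLAB code from the PR: the helper name `retry_with_last_error` and its arguments are hypothetical. The point it shows is that the last error is stored in a variable initialized before the loop, so the final failure message is always defined, even if no attempt ever ran.

```shell
# Hypothetical helper (not from the PR): run a command up to max_attempts
# times and keep the most recent error text for the final failure message.
retry_with_last_error() {
  max_attempts="$1"
  shift
  attempt=1
  # Initialized up front so the final message is defined even with 0 attempts.
  last_err="no attempt was made"
  while [ "$attempt" -le "$max_attempts" ]; do
    # Capture the command's stderr; its stdout is discarded.
    if last_err="$("$@" 2>&1 >/dev/null)"; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "download failed after $max_attempts attempts: $last_err" >&2
  return 1
}
```

A caller would pass the attempt count followed by the download command, for example `retry_with_last_error 3 some_download_command args`, where `some_download_command` stands in for whatever performs the transfer.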

https://claude.ai/code/session_01HMnM1qnDBgGdjSqSnjTfsV

stevevanhooser and others added 6 commits April 18, 2026 10:13
Gateway-level HTTP compression can corrupt binary/zip downloads when
MATLAB's websave auto-decompresses content-encoded responses, causing
"not a tar file" / invalid archive errors on Linux (Mac is unaffected).
Route the download through ndi.cloud.api.files.getFile with useCurl=true,
matching the pattern already used by didsqlite.do_openbinarydoc.

Gateway HTTP compression corrupts binary downloads when websave
auto-decompresses the response. Route through getFile with useCurl=true.

Gateway HTTP compression corrupts binary downloads when websave
auto-decompresses the response. Route through getFile with useCurl=true.

The API gateway serves binary artifacts (.zip for bulk document downloads,
.nbf.tgz for epoch binaries) with Content-Encoding: gzip. websave was
transparently decoding these on Mac, which masked the issue there;
plain `curl -L -o` does not decode and writes the compressed bytes to
disk, producing "invalid ZIP file" / "not a tar file" errors. Adding
--compressed makes curl advertise Accept-Encoding and decompress the
response, matching websave's prior behavior.

Our cloud payloads are already compressed archives (.zip, .nbf.tgz).
Layering HTTP-level gzip on top has produced corrupt archives on both
Mac (stream unzip fails) and Linux (untar fails) even with
--compressed, likely because streaming decoders misbehave on
already-compressed content. Explicitly request identity encoding so
the gateway delivers the file bytes as-is. Add -f so curl exits
non-zero on HTTP errors instead of writing an error body to the
destination file (which would then masquerade as a corrupt archive).
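
As a rough illustration of the final approach in the last commit message, a curl call that requests identity encoding and fails on HTTP errors might look like the sketch below. The function names, the `CURL_BIN` override, and the exact flag set are assumptions for illustration; the PR text does not show the actual command line that getFile builds. The second helper is an optional diagnostic, also hypothetical, for checking whether a file that should be a .zip was instead written with an HTTP-level gzip layer (gzip streams begin with the bytes 1f 8b, while ZIP archives begin with "PK"). It is not meaningful for .nbf.tgz files, which are gzip by design.

```shell
# Hypothetical wrapper around curl; the name and the CURL_BIN override are
# illustrative and not taken from the PR.
download_identity() {
  dl_url="$1"
  dl_dest="$2"
  # -f   exit non-zero on HTTP errors instead of saving the error body
  # -L   follow redirects
  # -sS  hide the progress meter but still print errors
  # -H   ask the server not to apply a content encoding
  "${CURL_BIN:-curl}" -f -L -sS \
    -H 'Accept-Encoding: identity' \
    -o "$dl_dest" "$dl_url"
}

# Returns 0 if the file starts with the gzip magic bytes 1f 8b.
starts_with_gzip_magic() {
  magic="$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')"
  [ "$magic" = "1f8b" ]
}
```

A caller could run `download_identity "$url" bundle.zip` and treat a non-zero exit status as a failed transfer, rather than discovering the problem later as an unreadable archive.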
@stevewds stevewds merged commit b483006 into main Apr 18, 2026
@stevevanhooser stevevanhooser deleted the claude/fix-linux-compression-issue-JkWEX branch April 18, 2026 15:09
